Temporal Relational Reasoning in Videos
Temporal relational reasoning, the ability to link meaningful transformations
of objects or entities over time, is a fundamental property of intelligent
species. In this paper, we introduce an effective and interpretable network
module, the Temporal Relation Network (TRN), designed to learn and reason about
temporal dependencies between video frames at multiple time scales. We evaluate
TRN-equipped networks on activity recognition tasks using three recent video
datasets - Something-Something, Jester, and Charades - which fundamentally
depend on temporal relational reasoning. Our results demonstrate that the
proposed TRN gives convolutional neural networks a remarkable capacity to
discover temporal relations in videos. Through only sparsely sampled video
frames, TRN-equipped networks can accurately predict human-object interactions
in the Something-Something dataset and identify various human gestures on the
Jester dataset with very competitive performance. TRN-equipped networks also
outperform two-stream networks and 3D convolution networks in recognizing daily
activities in the Charades dataset. Further analyses show that the models learn
intuitive and interpretable visual common-sense knowledge in videos. Comment: camera-ready version for ECCV'18.
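The multi-scale relation idea in this abstract can be made concrete with a toy computation. The sketch below is a minimal NumPy illustration with invented names, dimensions, and linear scoring functions standing in for the per-scale relation networks; it is not the authors' implementation:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def relation_module(frame_feats, scale, w, b):
    """Score one temporal scale: average a linear map (standing in for
    the per-scale relation MLP) over all ordered frame subsets of the
    given size."""
    n = len(frame_feats)
    scores = []
    for idx in itertools.combinations(range(n), scale):
        x = np.concatenate([frame_feats[i] for i in idx])
        scores.append(w[scale] @ x + b[scale])
    return np.mean(scores, axis=0)

def multiscale_trn(frame_feats, scales, w, b):
    # Sum the relation scores over all temporal scales (2-frame, 3-frame, ...).
    return sum(relation_module(frame_feats, s, w, b) for s in scales)

# Toy example: 4 sparsely sampled frames, 8-dim features, 5 activity classes.
feats = [rng.standard_normal(8) for _ in range(4)]
scales = (2, 3, 4)
w = {s: rng.standard_normal((5, 8 * s)) for s in scales}
b = {s: rng.standard_normal(5) for s in scales}
logits = multiscale_trn(feats, scales, w, b)
print(logits.shape)  # (5,)
```

The key point the sketch preserves is that each scale reasons over sparsely sampled frame subsets rather than dense frame sequences.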
Knowledge Distillation for Multi-task Learning
Multi-task learning (MTL) aims to learn a single model that performs multiple
tasks, achieving good performance on all of them at a lower computational cost.
Learning such a model requires jointly optimizing the losses of a set of tasks
with different difficulty levels, magnitudes, and characteristics
(e.g. cross-entropy, Euclidean loss), which leads to the imbalance problem in
multi-task learning. To address this imbalance problem, we propose a knowledge
distillation based method in this work. We first learn a task-specific model
for each task. We then train the multi-task model to minimize the task-specific
losses and to produce the same features as the task-specific models. As the
task-specific networks encode different features, we introduce small
task-specific adaptors that project the multi-task features to the
task-specific features. In this way, the adaptors align the task-specific and
multi-task features, enabling balanced parameter sharing across tasks.
Extensive experimental results demonstrate that our method can optimize a
multi-task learning model in a more balanced way and achieve better overall
performance. Comment: We propose a knowledge distillation method for addressing
the imbalance problem in multi-task learning.
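The adaptor-based alignment described above can be illustrated with a toy computation. This is a hedged sketch with invented shapes and names, not the authors' training code: each task's small linear adaptor projects the shared multi-task feature toward the frozen task-specific teacher's feature, and the distillation term penalizes the mismatch.

```python
import numpy as np

rng = np.random.default_rng(1)

def distill_loss(multi_feat, teacher_feats, adaptors):
    """Sum of per-task alignment losses: each linear adaptor projects
    the shared multi-task feature toward the corresponding frozen
    task-specific teacher feature (all names are illustrative)."""
    loss = 0.0
    for task, f_teacher in teacher_feats.items():
        projected = adaptors[task] @ multi_feat
        loss += np.mean((projected - f_teacher) ** 2)
    return loss

dim = 16
multi_feat = rng.standard_normal(dim)
teacher_feats = {"seg": rng.standard_normal(dim),
                 "depth": rng.standard_normal(dim)}
# Small adaptors initialized near the identity.
adaptors = {t: np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
            for t in teacher_feats}
print(distill_loss(multi_feat, teacher_feats, adaptors))
```

In training, this alignment term would be added to the ordinary task losses; here only the alignment term is shown.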
Adding New Tasks to a Single Network with Weight Transformations using Binary Masks
Visual recognition algorithms are required today to exhibit adaptive
abilities. Given a deep model trained on a specific task, it would be highly
desirable to be able to adapt incrementally to new tasks, preserving
scalability as the number of tasks increases while avoiding catastrophic
forgetting. Recent work has shown that masking the internal weights of a given
original conv-net through learned binary variables is a promising strategy. We
build upon this intuition and consider more elaborate affine transformations of
the convolutional weights that include learned binary masks. We show that with
our generalization it is possible to achieve significantly higher levels of
adaptation to new tasks, enabling the approach to compete with fine-tuning
strategies while requiring only slightly more than 1 bit per network parameter
per additional task. Experiments
on two popular benchmarks showcase the power of our approach, which achieves a
new state of the art on the Visual Decathlon Challenge.
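One plausible reading of "affine transformations of the convolutional weights that include learned binary masks" can be sketched as follows. The parameterization below (W_t = k0*W + k1*(W*M) + k2*M) is an assumption for illustration, not necessarily the paper's exact formulation; the point preserved is that only the binary mask M needs to be stored per task, hence roughly 1 bit per parameter.

```python
import numpy as np

rng = np.random.default_rng(2)

def task_weights(w_base, mask, k0, k1, k2):
    """Affine transform of frozen base weights with a per-task binary
    mask M: W_t = k0*W + k1*(W*M) + k2*M. Illustrative parameterization."""
    return k0 * w_base + k1 * (w_base * mask) + k2 * mask

w = rng.standard_normal((3, 3))                    # frozen base conv kernel
mask = (rng.random((3, 3)) > 0.5).astype(float)    # ~1 bit per parameter
w_t = task_weights(w, mask, k0=1.0, k1=0.5, k2=0.1)
print(w_t.shape)  # (3, 3)
```

Setting k1 = k2 = 0 recovers the original weights, so the base task is preserved by construction while each new task only adds its mask and three scalars.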
Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification
Popular approaches for few-shot classification consist of first learning a
generic data representation based on a large annotated dataset, before adapting
the representation to new classes given only a few labeled samples. In this
work, we propose a new strategy based on feature selection, which is both
simpler and more effective than previous feature adaptation approaches. First,
we obtain a multi-domain representation by training a set of semantically
different feature extractors. Then, given a few-shot learning task, we use our
multi-domain feature bank to automatically select the most relevant
representations. We show that a simple non-parametric classifier built on top
of such features produces high accuracy and generalizes to domains never seen
during training, leading to state-of-the-art results on Meta-Dataset and
improved accuracy on mini-ImageNet. Comment: ECCV'20.
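The selection-plus-classification step described above can be sketched with a toy nearest-centroid classifier over a weighted multi-domain feature bank. The per-domain weights below are fixed by hand purely for illustration (in the paper they are selected automatically per few-shot task), and all feature values are invented:

```python
import numpy as np

def select_and_classify(feat_bank, support_labels, query_feats, weights):
    """Nearest-centroid classification on a weighted concatenation of
    per-domain features; the lambda weights stand in for the selection step."""
    sup = np.concatenate([weights[d] * f for d, f in sorted(feat_bank.items())], axis=1)
    qry = np.concatenate([weights[d] * f for d, f in sorted(query_feats.items())], axis=1)
    classes = sorted(set(support_labels))
    centroids = np.stack([sup[np.array(support_labels) == c].mean(axis=0)
                          for c in classes])
    dists = ((qry[:, None, :] - centroids[None]) ** 2).sum(-1)
    return [classes[i] for i in dists.argmin(axis=1)]

# Toy task: two domains, two classes, one query sample near class 0.
sup = {"imagenet": np.array([[0., 0.], [0.1, 0.], [1., 1.], [0.9, 1.]]),
       "quickdraw": np.array([[0., 0.], [0., 0.1], [1., 1.], [1., 0.9]])}
labels = [0, 0, 1, 1]
qry = {"imagenet": np.array([[0.05, 0.]]),
       "quickdraw": np.array([[0., 0.05]])}
weights = {"imagenet": 1.0, "quickdraw": 0.5}
print(select_and_classify(sup, labels, qry, weights))  # [0]
```

The non-parametric classifier matches the abstract's claim: no new parameters are trained at test time beyond the selection weights.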
Using AI to Enable Design for Diversity: A Perspective
Inclusive design focuses on diversity. Contextualized, user-sensitive design of interactive systems must analyze and address complex diversity factors, which challenges traditional design processes, tools, and methods; new technological progress is therefore needed to provide more innovation potential. The authors point out that the design process of smart products is evolving in response to uncertainty. In the future, diversity-oriented design will tend to allocate design resources and values algorithmically rather than through a single compromised solution. This paper analyzes the limitations and potential of AI technology, represented by deep learning, in diversity-oriented design practice and design research, puts forward goals and directions for further research, and discusses the critical links of AI-enabled diversity design in an interdisciplinary research environment.
BézierSketch: A Generative Model for Scalable Vector Sketches
The study of neural generative models of human sketches is a fascinating
contemporary modeling problem due to the links between sketch image generation
and the human drawing process. The landmark SketchRNN provided a breakthrough
by sequentially generating sketches as a sequence of waypoints. However, this
leads to low-resolution image generation and failure to model long sketches. In
this paper we present BézierSketch, a novel generative model for fully vector
sketches that are automatically scalable and high-resolution. To this end, we
first introduce a novel inverse-graphics approach to stroke embedding that
trains an encoder to embed each stroke as its best-fit Bézier curve. This
enables us to treat sketches as short sequences of parameterized strokes and
thus train a recurrent sketch generator with greater capacity for longer
sketches, while producing scalable high-resolution results. We report
qualitative and quantitative results on the Quick, Draw! benchmark. Comment:
Accepted as a poster at ECCV 2020.
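The stroke-to-Bézier embedding can be approximated without any learning, which helps make the idea concrete. The sketch below fits one cubic Bézier curve to a stroke by least squares under chord-length parameterization; the paper instead trains an encoder for this inverse-graphics step, so this is a simplified stand-in:

```python
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares fit of one cubic Bezier curve to a stroke's points,
    using chord-length parameterization. Returns 4 control points, which
    act as a compact, resolution-free stroke embedding."""
    pts = np.asarray(points, float)
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = d / d[-1]
    # Bernstein basis matrix for a cubic Bezier at the chord parameters.
    B = np.stack([(1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3], axis=1)
    ctrl, *_ = np.linalg.lstsq(B, pts, rcond=None)
    return ctrl

def eval_bezier(ctrl, t):
    """Render the curve at any resolution from its 4 control points."""
    t = np.asarray(t)[:, None]
    B = np.concatenate([(1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3], axis=1)
    return B @ ctrl

# Toy stroke: evenly spaced points on a straight line, recovered exactly.
stroke = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
ctrl = fit_cubic_bezier(stroke)
recon = eval_bezier(ctrl, np.linspace(0, 1, 5))
print(np.abs(recon - np.array(stroke)).max())  # ~0 for this straight stroke
```

Because the embedding is four control points rather than many waypoints, a sequence model over strokes stays short, which is the capacity argument made in the abstract.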
"Are Machines Better Than Humans in Image Tagging?" - A User Study Adds to the Puzzle
"Do machines perform better than humans in visual recognition tasks?" Not so
long ago, this question would have been considered somewhat provoking, and the
answer would have been a clear "No". In this paper, we present a comparison of
human and machine performance with respect to annotation for multimedia
retrieval tasks. Going beyond recent crowdsourcing studies in this respect, we
also report results of two extensive user studies. In total, 23 participants
were asked to annotate more than 1000 images of a benchmark dataset, the most
comprehensive study in the field so far. Krippendorff's α is used to measure
inter-coder agreement among several coders, and the results are compared with
the best machine results. The study is preceded by a summary of studies that
compared human and machine performance in different visual and auditory
recognition tasks. We discuss the results and derive a methodology for
comparing machine performance in multimedia annotation tasks at the human
level. This allows us to formally answer the question of whether a recognition
problem can be considered solved. Finally, we answer the initial question.
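Krippendorff's α, used above to measure inter-coder agreement, can be computed directly for nominal data as α = 1 − D_o/D_e (observed over expected disagreement). The sketch below builds the coincidence matrix and evaluates α on an invented two-coder example; the numbers are illustrative, not from the study:

```python
from collections import Counter
import itertools

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data. `units` is a list of lists:
    the ratings the coders gave to one item (items with fewer than two
    ratings are skipped, as the measure requires pairable values)."""
    coincidence = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        # Each ordered pair of ratings within a unit contributes 1/(m-1).
        for a, b in itertools.permutations(ratings, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)
    n_c = Counter()
    for (a, _), wgt in coincidence.items():
        n_c[a] += wgt
    n = sum(n_c.values())
    d_o = sum(wgt for (a, b), wgt in coincidence.items() if a != b)
    d_e = sum(n_c[a] * n_c[b] for a, b in itertools.permutations(n_c, 2)) / (n - 1)
    return 1.0 - d_o / d_e

# Two coders tag four images: they agree on three, disagree on one.
ratings = [["cat", "cat"], ["dog", "dog"], ["cat", "cat"], ["cat", "dog"]]
print(round(krippendorff_alpha_nominal(ratings), 3))  # 0.533
```

Perfect agreement yields α = 1, and α near 0 indicates agreement no better than chance, which is what makes the statistic suitable for the human-versus-machine comparison described above.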
Observation of Kuznetsov-Ma soliton dynamics in optical fibre
The nonlinear Schrödinger equation (NLSE) is a central model of nonlinear science, applying to hydrodynamics, plasma physics, molecular biology, and optics. The NLSE admits only a few elementary analytic solutions, but one in particular, describing a localized soliton on a finite background, is of intense current interest in the context of understanding the physics of extreme waves. However, although the first solution of this type, the Kuznetsov-Ma (KM) soliton, was derived in 1977, there have in fact been no quantitative experiments confirming its validity. We report here novel experiments in optical fibre that confirm the KM soliton theory, completing an important series of experiments that have now observed a complete family of soliton-on-background solutions to the NLSE. Our results also show that KM dynamics appear more universally than under the specific conditions originally considered, and can be interpreted as an analytic description of Fermi-Pasta-Ulam recurrence in NLSE propagation.
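The NLSE propagation underlying these experiments can be explored numerically with a standard split-step Fourier scheme. The sketch below uses normalized units and a weakly modulated plane-wave input reminiscent of a soliton on a finite background; the parameters are arbitrary and this is a generic integrator, not a model of the paper's fibre experiment:

```python
import numpy as np

def split_step_nlse(psi0, dz, steps, t_span):
    """Symmetric split-step Fourier integration of the focusing NLSE
    i*psi_z + (1/2)*psi_tt + |psi|^2 * psi = 0 in normalized units."""
    n = len(psi0)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=t_span / n)
    half_linear = np.exp(-0.5j * omega**2 * dz / 2)  # half-step of dispersion
    psi = psi0.astype(complex)
    for _ in range(steps):
        psi = np.fft.ifft(half_linear * np.fft.fft(psi))
        psi *= np.exp(1j * np.abs(psi)**2 * dz)       # full nonlinear step
        psi = np.fft.ifft(half_linear * np.fft.fft(psi))
    return psi

# Weakly modulated plane wave: a finite background with a small perturbation.
t = np.linspace(-20, 20, 256, endpoint=False)
psi0 = 1.0 + 0.1 * np.cos(0.5 * t)
psi = split_step_nlse(psi0, dz=0.01, steps=200, t_span=40.0)
power_in = np.sum(np.abs(psi0)**2)
power_out = np.sum(np.abs(psi)**2)
# The scheme conserves total power to near machine precision.
print(abs(power_out - power_in) / power_in)
```

Both sub-steps are norm-preserving (a unitary Fourier multiplier and a pure phase rotation), which is why the power check holds and why this scheme is the usual tool for studying breather and recurrence dynamics numerically.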